Using LSA to Automatically Identify Givenness and Newness of Noun Phrases in Written Discourse
Authors
Abstract
Identifying given and new information within a text is a long-standing research question. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically assigning noun phrases to one of three categories of givenness/newness, using the taxonomy of Prince (1981) as the gold standard. The central computational technique is span (Hu et al., 2003), a derivative of latent semantic analysis (LSA). We analyzed noun phrases from two expository and two narrative texts. Predictors of newness included span as well as pronoun status, determiners, and word overlap with previous noun phrases. Logistic regression showed that span was superior to LSA in categorizing noun phrases, producing an increase in accuracy from 74% to 80%.
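The abstract does not spell out how span is computed. The sketch below shows one common reading of the idea, offered as an assumption rather than the authors' implementation: the vector for a noun phrase is split into a component lying in the subspace spanned by the vectors of the preceding text (treated as "given") and a component perpendicular to that subspace (treated as "new"), and newness is the ratio new / (new + given). The function name `span_newness`, the ratio formula, and the three-dimensional toy vectors (standing in for real LSA vectors) are all illustrative choices not taken from the source.

```python
import numpy as np


def span_newness(prior_vectors, target):
    """Split `target` into given and new parts relative to prior vectors.

    Hypothetical helper: the new / (new + given) ratio is an assumed
    reading of the span method, not a formula stated in the abstract.
    Returns (given_norm, new_norm, newness).
    """
    target = np.asarray(target, dtype=float)
    prior = np.asarray(prior_vectors, dtype=float)

    if prior.size == 0:
        # No preceding text: nothing can count as given.
        given_part = np.zeros_like(target)
    else:
        # Columns of A are the prior vectors. The least-squares solution
        # yields the orthogonal projection of target onto their span.
        A = prior.T
        coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
        given_part = A @ coeffs

    new_part = target - given_part
    given_norm = float(np.linalg.norm(given_part))
    new_norm = float(np.linalg.norm(new_part))
    total = given_norm + new_norm
    newness = new_norm / total if total > 0 else 0.0
    return given_norm, new_norm, newness


if __name__ == "__main__":
    # Toy vectors standing in for LSA vectors of earlier noun phrases.
    prior = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    for phrase_vector in ([1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 1.0]):
        print(phrase_vector, span_newness(prior, phrase_vector))
```

A vector lying entirely inside the span of the earlier vectors scores near 0 (fully given), and one orthogonal to them scores 1 (fully new). In a setup like the one the abstract describes, such a score would be one predictor among several (pronoun status, determiners, word overlap) fed to a logistic regression.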
Similar resources
Newness and Givenness of Information: Automated Identification in Written Discourse
The identification of new versus given information within a text has been frequently investigated by researchers of language and discourse. Despite theoretical advances, an accurate computational method for assessing the degree to which a text contains new versus given information has not previously been implemented. This study discusses a variety of computational new/given systems and analyzes...
Informational Status and Pitch Accent Distribution in Spontaneous Dialogues in English
Revealing the relations between pitch accent types and the informational status of words requires a refined discourse analysis of spontaneous speech. A cooperative unscripted task in which subjects gave instructions for decorating Christmas trees successfully induced production of target adjective-noun pairs conveying new/given and contrastive information. Adapting Grosz and Sidner’s intention-...
Corpus-Based Identification of Non-Anaphoric Noun Phrases
Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphor...
Corpus-Based Identification of Non-Anaphoric Noun Phrases
Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, w...
Use of Articles in Learning English as a Foreign Language: A Study of Iranian English Undergraduates
The significance of error analysis for the learner, the teacher and the researcher is now widely recognized. Earlier studies of error analysis concentrated on intersystematic comparison of the “native language” and the “target language” and drew the required data largely from intuitions and impressionistic observations. This study was conducted on the basis of the following observations: (1) to...